ABSTRACT

The autonomous transcription of integrated 
retroviruses strongly depends on genetic and epi- 
genetic effects of the chromatin at the site of inte- 
gration. These effects are mostly suppressive and 
proviral activity can be finally silenced by mechan- 
isms, such as DNA methylation and histone modifi- 
cations. To address the role of the integration site 
at the whole-genome-scale, we performed clonal 
analysis of provirus silencing with an avian 
leucosis/sarcoma virus-based reporter vector and 
correlated the transcriptional silencing with the 
epigenomic landscape of respective integrations. 
We demonstrate efficient provirus silencing in the
human HCT116 cell line, which is strongly but not
absolutely dependent on de novo DNA
methyltransferase activity, particularly that of Dnmt3b.
Proviruses integrated close to the transcription 
start sites of active genes into the regions 
enriched in H3K4 trimethylation display long-term 
stability of expression and are resistant to the tran- 
scriptional silencing after over-expression of 
Dnmt3a or Dnmt3b. In contrast, proviruses in the 
intergenic regions tend toward spontaneous transcriptional
silencing even in Dnmt3a-/- Dnmt3b-/-
cells. The silencing of proviruses within genes is
accompanied by DNA methylation of long
terminal repeats, whereas silencing in intergenic 
regions is DNA methylation-independent. These 
findings indicate that the epigenomic features of 
integration sites are crucial for their permissivity to 
the proviral expression. 



INTRODUCTION 

In the course of retrovirus infection, the integration of 
proviral DNA and its subsequent transcription into viral 
mRNAs are important steps at which the host cell regulatory
mechanisms interfere with virus propagation. The host-cell
control of provirus transcription can eliminate the
deleterious effects of retroviruses but, on the other hand, 
it has to be taken into account in retrovirus-mediated gene 
transfer, transgenesis, and gene therapy where stable and 
long-term provirus expression is required. The cellular 
DNA sequences adjacent to the integrated retrovirus can 
influence the proviral transcriptional activity. In general, 
transcriptionally active regions are permissive for virus- 
gene expression while integration into heterochromatin 
disfavors virus transcriptional activity (1,2).

Multiple studies analyzed retrovirus integration 
patterns at the genome-wide scale and revealed virus- 
specific differences in integration preferences. Human 
immunodeficiency virus type 1 (HIV-1) preferentially 
targets transcriptionally active genes and, correspond- 
ingly, gene-rich and GC-rich chromosomal regions (3,4). 
Along the transcription units (TUs), there is no preference
for introns, exons, CpG dinucleotide (CpG)
islands, or transcription start sites (TSSs) (5). This
integration pattern is directed by the tethering of 
LEDGF/p75 with HIV-1 integrase and open chromatin 
components (6-8). In striking contrast, integrations of 
gamma-retroviruses and spuma-retroviruses are 
over-represented around TSSs and CpG islands (5,9,10), 
which might be the cause of documented genotoxicity and 
leukemia induction by a murine leukemia virus (MLV)- 
derived vector in a gene therapy trial (11). Cellular 
factor(s) channeling MLV to integrate close to TSSs are 
not known, although several transcription factors and 
chromatin-associated proteins interacting with MLV 
integrase are good candidates (12). Avian sarcoma and
leukosis viruses (ASLV) display weak preferences for 
TUs but not for TSSs (5,13,14), and mouse mam- 
mary tumor virus integrates randomly across the host 
genome (15). 

Only a few studies describe non-random sets of integration
sites with either transcriptionally active or silenced
proviruses. For example, Lewinski et al. (16) separated
cells infected with an HIV-based reporter vector into 
populations with stable provirus expression and with 
proviruses whose expression depended on the stimulation 
by TNFα. Both populations showed similar over-representation
of integration sites in genes, but proviruses
with TNFα-dependent activity were more frequently
found in centromeric alphoid repeats, in long-intergenic 
regions, and in very highly expressed genes (16). 
Similarly, the transcriptional interference was observed 
in an in vitro model of HIV-1 latency where most latent 
proviruses integrated in introns of highly transcribed 
genes with a modest preference for the same orientation 
as the host gene (17). Second, proviruses in tumors 
induced by Rous sarcoma virus (RSV)-derived vectors 
(18) represent transcriptionally active copies and 
accumulated in TUs, CpG islands, and around TSSs. 
Most strikingly, almost all genic integrations were found
in the genes expressed in multiple tissues, whereas 
tissue-specifically expressed genes were avoided. Both 
studies pointed to some chromosomal features promoting 
or repressing the integrated proviruses but exact analysis 
of individually characterized proviruses is lacking. 

Transcriptional provirus silencing was described in 
many experimental settings and multiple suppressive 
mechanisms evolved probably as a protection from 
the deleterious outcomes of retrovirus infection and mo- 
bilization of endogenous retroviruses. For example, 
the zinc finger protein ZFP809 of the Krüppel-associated
box (KRAB) family together with the transcrip- 
tional co-repressor KRAB-associated protein 1 (KAP-1/ 
Trim28) bind in a sequence-specific manner the repressor- 
binding site present in the primer-binding site of MLV 
(19-21). This binding explains the potent silencing of 
MLV in murine embryonic stem cells (22,23) and the 
release of silencing of the murine stem cell virus, which 
evolved different primer-binding specificity (24,25). 

The executive mechanisms of transcriptional 
silencing include proviral de novo DNA methylation and 
marking the provirus-associated nucleosomes by repres- 
sive histone modifications. DNA methylation of long 
terminal repeats (LTRs) was demonstrated to accompany 
the silenced MLV (26-29), Rous sarcoma virus (30), 
HIV-1 (31-33), HTLV-1 (34,35), and various families of 
human endogenous retroviruses (36-39). Furthermore, 
mutation of CpGs within the retroviral LTRs reduces 
provirus silencing (40), and insertion of a CpG island 
core sequence into or upstream to the 5'LTR is an efficient 
anti-silencing strategy (41,42). On the other hand, 
provirus silencing occurs even in cells deficient in 
de novo DNA methyltransferases Dnmt3a/b (29,43), and 
DNA methylation is dispensable for the silencing in em- 
bryonic stem cells (44). These facts point to the repressive 
histone marks as an alternative mechanism of provirus
silencing. Particularly, di- or tri-methylation of the
H3K9 by lysine methyltransferases G9a and Eset has 
been correlated with transcriptional repression of newly 
integrated and endogenous retroviruses (45-48). A recent
siRNA-based knock-down screen identified a handful of
epigenetic factors participating in a non-redundant 
silencing network in HeLa cells (49). 

Taken together, the interplay of major suppressive
factors in establishing and maintaining the silent
provirus remains to be clarified. We suggest here that
clonal analysis of the silencing of individual proviruses
in the context of their chromatin environment and chromosomal
positions is urgently needed for this purpose.
To better understand the role of de novo DNA methyl- 
transferases in the silencing process, we compared the ex- 
pression of individual proviruses in cells with intact or 
deleted DNA methyltransferase genes. In this study, we 
found that only a defined subset of provirus integrations is 
fully resistant to transcription silencing and prone to the 
long-term expression of transduced genes. 

MATERIALS AND METHODS 

Construction of the retrovirus vector 

We constructed the pAG3 replication-defective reporter 
retrovirus vector by replacement of the gag, pol, and env 
genes in the replication-competent vector RCASBP(A) 
(50) with the GFP-coding sequence. pRCASBP(A) was
amplified with primers RV3-ClaI(2) and RV3-R2
(Supplementary Table S1), which span from the 3'UTR
across the plasmid backbone to position +634 in
gag, and the product was self-ligated. The gag initiation
ATG codon and the inner gag ATG codon 120 were destroyed
by introduction of point mutations using the
Transformer site-directed mutagenesis kit (Clontech).
Mutagenesis was performed according to the manufacturer's
protocol with mutagenic primers mutATGgag
and RV3-mTAG, and selection primers select PstI/SacII
and select ScaI/BglII (Supplementary Table S1) for selection
with PstI or ScaI restriction enzymes, respectively.
The resulting construct pRV3 represents the vector
backbone comprising ASLV LTRs and necessary
packaging sequences. The linker from adaptor plasmid
pCla12 (50) was cloned into the unique ClaI restriction
site of the pRV3 vector. The EGFP coding sequence was
then cloned from the plasmid pEGFP (Clontech) via XbaI
restriction sites in the Cla12 linker and the resulting retroviral
vector pAG3 was used for the virus propagation.

Cell culture and virus propagation 

The packaging AviPack cell line (18) was maintained in 
D-MEM/F12 Eagle's modified medium (Sigma) supple- 
mented with 5% of newborn calf serum, 5% of fetal calf 
serum, 1% of chicken serum (all Gibco BRL), and penicillin/streptomycin
(100 mg/ml each, Sigma) in a 3% CO2
atmosphere at 37°C. The HCT116 human colorectal carcinoma
cell line and its subclones with knock-outs of
DNA methyltransferases HCT116 Dnmt1-/-, HCT116
Dnmt1-/- Dnmt3b-/-, HCT116 Dnmt3b-/-, and
HCT116 Dnmt3a-/- Dnmt3b-/- (51-53) were obtained
from Bert Vogelstein, Johns Hopkins University School of
Medicine, Baltimore, Maryland, and maintained in the 
same conditions except for supplementation with chicken 
serum. The AviPack packaging system was utilized for the 
virus propagation and pseudo-typing with vesicular sto- 
matitis virus protein G (VSV-G) as described in (18). 
Briefly, 10^7 AviPack cells plated on a 150 mm Petri dish
were cultured and, after 24 h, co-transfected with 50 µg of pAG3 and
10 µg of pVSV-G (Clontech) plasmids by calcium phosphate
precipitation. The fresh cultivation
medium was supplemented with 100 mM glucose 24 h
post-transfection (p.t.) and collected twice, 48 h and 72 h
p.t. The viral stocks obtained were clarified by centrifugation
at 200 × g for 10 min at 4°C, and the supernatants were collected
and centrifuged at 23,000 rpm for 150 min at 4°C in rotor
SW28, Beckman Optima100 (Beckman). The pellet was
resuspended in a culture medium containing 10%
newborn calf serum, frozen, and stored at -80°C.
Titration of the infectious virus stock was performed by 
its serial dilution and subsequent infection of DF-1 cells. 
Two days post-infection (p.i.), the number of 
GFP-positive cells or cell clusters was counted. The 
titrated stock was used for infection of HCT116 cells. 

Infection and subcloning of HCT116 cells 

We plated 10^6 cells of the wild-type (wt) HCT116 cell line
and its DNA methyltransferase-deficient derivatives per
100 mm Petri dish and infected them with the AG3
replication-deficient retroviral vector at a multiplicity of
infection (MOI) of 0.02, 24 h after plating. Virus AG3 was
passed through a 0.2 µm SFCA filter (Corning) and 600 µl
of the suspension was applied per dish and allowed to
adsorb for 40 min at room temperature. After adsorption,
12 ml of fresh medium was added and cells were cultured
at 37°C and 3% CO2. Three to six days p.i., the percentage
of GFP-positive cells was analyzed by flow cytometry, and 
GFP-positive cells were sorted in single-cell sort mode 
with FACSVantage SE (Becton-Dickinson) into 96-well 
tissue culture plates to obtain single-cell clones. 
Expanded clones were sub-cultured and percentages of 
GFP-positive cells were assessed in one-week intervals 
with the LSR II cytometer (Becton-Dickinson). 
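
The choice of a very low MOI can be motivated with a simple Poisson model of infection. The sketch below is an illustration only and is not part of the original protocol: it assumes that the number of proviruses per cell is Poisson-distributed with a mean equal to the MOI, and the function name is hypothetical.

```python
import math

def poisson_infection_stats(moi: float) -> dict:
    """Return infection fractions under a Poisson model with mean `moi`."""
    p0 = math.exp(-moi)          # fraction of cells with no provirus
    p1 = moi * math.exp(-moi)    # fraction of cells with exactly one provirus
    infected = 1.0 - p0          # fraction of cells with at least one provirus
    multiple = infected - p1     # fraction of cells with two or more proviruses
    return {
        "infected": infected,
        "single": p1,
        "multiple_among_infected": multiple / infected,
    }

# MOI used for the HCT116 infections described above
stats = poisson_infection_stats(0.02)
print(f"infected cells: {stats['infected']:.4f}")
print(f"multiply infected among infected: {stats['multiple_among_infected']:.4f}")
```

Under this assumption, about 2% of cells are infected at MOI 0.02, and roughly 1% of the infected cells carry more than one provirus.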

Over-expression of Dnmt3a and Dnmt3b 

Vectors for ectopic over-expression of de novo DNA 
methyltransferases were created from pVitro expression 
construct (InvivoGen) by replacement of the GFP
coding sequence with tdTomato fluorescence marker in 
the first expression cassette to allow tracking of efficiently 
transfected cells. This vector was denoted pVitroT and 
was used as a mock-transfection control. The second ex- 
pression cassette of the pVitroT was used for insertions of 
the particular de novo DNA methyltransferase coding 
sequence. The Dnmt3a and Dnmt3b molecular clones 
were obtained from IMAGE Consortium Library. The 
DNA methyltransferase-coding regions were amplified 
by Pfu DNA polymerase (Promega) with primers 
hDNMT3A-BglII-L and hDNMT3A-NheI-R (Dnmt3a) 
or hDNMT3B-BglII-L and hDNMT3B-NheI-TAG 
(Dnmt3b) (Supplementary Table S1). The PCR products
were cloned into the pGEM-T Easy vector (Promega) and
subsequently inserted into the pVitroT vector as BglII-NheI
fragments. The resulting pVitro3AT and
pVitro3BT plasmids were sequenced to exclude occurrence 
of point mutations. The transfection was performed with 
Fugene HD-transfection reagent (Roche) according to the 
manufacturer's protocol. The cells were then cultivated for 
7 days in order to manifest potential changes in provirus 
expression and DNA methylation pattern, and expression 
of GFP in the transfected tdTomato-positive cells was 
analyzed with the LSR II cytometer (Becton-Dickinson). 

To quantify the level of Dnmt3a and Dnmt3b expression,
the transfected wt HCT116 cell culture was harvested
on day 4 p.t., and tdTomato-positive cells were sorted by 
FACSVantage SE (Becton-Dickinson). Total RNA was 
isolated from the collected cells using RNAzol RT 
(Molecular Research Center, Inc.). One microgram of the 
isolated RNA was treated for 20 min with DNase I (New
England Biolabs). DNase I-treated RNA samples were subjected
to reverse transcription using M-MLV reverse transcriptase
(Promega) and oligo dT primers in a 50 µl reaction
volume. One microliter of the resulting cDNA was used for
the triplicate qPCR using the MESA GREEN qPCR
MasterMix Plus for SYBR Assay Kit (Eurogentec) in a
total volume of 20 µl with 400 nM concentration of
primers. We used the following primers designed against 
human DNA methyltransferases: hDNMT3a-FW and 
hDNMT3A-RV designed for exons 20-22 of human 
Dnmt3a, hDNMT3B-FW and hDNMT3b-RV designed 
for exons 19-20 of human Dnmt3b (Supplementary
Table S1). The size of the PCR product was 200 bp in
both cases. The RNA polymerase IIa (POLR2a) amplified
with primers POLR2a-FW and POLR2a-RV
(Supplementary Table S1) was used as a reference housekeeping
gene. The calibration curves were set according
to the amplification of cDNA of the following genes: 
Dnmt3a, Dnmt3b, and housekeeping gene POLR2a. 
These products were cloned into the pGEM-T Easy 
vector (Promega) and 10-fold diluted in the range 10 to 10^7
molecules per one RT reaction. PCR reactions were run
for 40 cycles in the Chromo4 system for RT-PCR
thermocycler (Bio-Rad) with an annealing temperature of
60°C. The reaction products were resolved on 2% agarose
gel. The results were normalized to 10^5 molecules of
POLR2a.
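
The quantification described above (a calibration curve from a 10-fold dilution series, followed by normalization to 10^5 molecules of POLR2a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: the function names are hypothetical, the Cq values are invented and idealized, and a separate curve would be fitted for each amplicon.

```python
import math

def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mx, my = sum(x) / n, sum(cq) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate the template copy number."""
    return 10 ** ((cq - intercept) / slope)

def normalize_to_reference(target_copies, reference_copies, per=1e5):
    """Express target copies per `per` copies of the reference gene."""
    return target_copies / reference_copies * per

# Hypothetical dilution series from 10 to 10^7 molecules with invented Cq values
dilutions = [10 ** k for k in range(1, 8)]
cq_values = [36.0 - 3.32 * (k - 1) for k in range(1, 8)]

slope, intercept = fit_standard_curve(dilutions, cq_values)
# Amplification efficiency implied by the slope (1.0 means perfect doubling)
efficiency = 10 ** (-1.0 / slope) - 1.0
print(f"slope={slope:.2f}, intercept={intercept:.2f}, efficiency={efficiency:.3f}")
```

In practice, the copy numbers estimated for Dnmt3a or Dnmt3b and for POLR2a in the same sample would be passed to `normalize_to_reference` to obtain expression per 10^5 POLR2a molecules.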

Reactivation of silenced proviruses by Dnmt and histone 
deacetylase (HDAC) inhibitors 

Each clone was split into four wells and separately treated 
with reactivation agents. The culture medium was supplemented
with 4 µM 5-azacytidine (5-azaC, Sigma) and
2 mM sodium butyrate (Sigma), alone or in combination.
The inhibitor concentrations used for the reactivation
were titrated and set as a compromise
between reactivation efficiency and minimum toxicity.
The clones were treated for 2 days and subsequently collected
and analyzed by flow cytometry. Prolonged treatment
did not lead to stronger reactivation but caused more
pronounced cell toxicity.

Analysis of DNA methylation by bisulfite sequencing

The genomic DNA isolated by phenol-chloroform extrac- 
tion from the infected cells was treated with sodium 
bisulfite using the EpiTect bisulfite kit (Qiagen, Hilden, 
Germany) according to the manufacturer's protocol. The 
nested PCR of the upper strand was performed with 
bisRV-LTR-LO, bisRV-LTR2-L, bisRV-LTR2-Router, 
and bisRV-LTR2-Rinner primers complementary to the 
U3 region of the ASLV LTR and the leader region en- 
compassing all but one CpG within the LTR 
(Supplementary Table S1). PCR reactions were carried
out with 200 ng of bisulfite-treated DNA by 35 cycles of 95°C
for 30 s, 58°C for 2 min, and 72°C for 90 s. The PCR
products were cloned into the pGEM-T Easy vector 
(Promega) and sequenced by the universal pUC/M13 
forward primer. 
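
Reading methylation from the sequenced clones follows the usual bisulfite logic: on the analyzed strand, an unmethylated cytosine is read as T after conversion and PCR, whereas a methylated cytosine remains C. The sketch below illustrates this scoring under simplifying assumptions that are not taken from the paper: each read is aligned gap-free to the unconverted reference, and the sequences are toy examples rather than the ASLV LTR. All function names are hypothetical.

```python
def cpg_positions(reference: str) -> list:
    """Indices of the C in every CpG dinucleotide of the unconverted reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]

def methylation_calls(reference: str, read: str) -> list:
    """Call each CpG as True (methylated), False (unmethylated), or None (unclear)."""
    if len(reference) != len(read):
        raise ValueError("read must be aligned gap-free to the reference")
    calls = []
    for i in cpg_positions(reference):
        base = read[i].upper()
        if base == "C":
            calls.append(True)    # C retained after conversion: methylated
        elif base == "T":
            calls.append(False)   # C converted to T: unmethylated
        else:
            calls.append(None)    # sequencing error or ambiguous base
    return calls

def percent_methylated(reference: str, reads: list) -> float:
    """Percentage of methylated CpGs over all informative CpG calls."""
    calls = [c for r in reads for c in methylation_calls(reference, r)
             if c is not None]
    return 100.0 * sum(calls) / len(calls) if calls else float("nan")

# Toy data: three CpGs in the reference, two sequenced clones
reference = "ACGTTCGACCGA"
reads = [
    "ACGTTTGATCGA",  # first and third CpG methylated
    "ATGTTTGATTGA",  # fully unmethylated clone
]
print(methylation_calls(reference, reads[0]))
print(round(percent_methylated(reference, reads), 1))
```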

Cloning and sequencing of provirus integration sites 

The provirus-cell DNA junction sequences were amplified 
using the splinkerette-PCR method (54). The genomic 
DNA was isolated by phenol-chloroform extraction 
from individual clones and cleaved with one of the following
restriction enzymes: Sau3AI, DpnII, or MseI. The
restriction fragments were ligated overnight at 15°C with a
10-fold molar excess of adaptors formed by the annealing
of HMspAa and HMspBb-Sau3AI or HMspBb-MseI
oligonucleotides (Supplementary Table S1) complementary
to the particular cleavage site of the enzyme used
for the DNA digestion. The ligation products were subsequently
cleaved with Bsu36I to destroy undesirable
products of adaptor ligation to the 3'LTRs. The resulting 
mixture of fragments was then purified by phenol-chloro- 
form extraction and used as a template for nested-PCR 
reaction with primers specific for the retrovirus LTR and 
the splinkerette adaptor (Supplementary Table S1).
Primary PCR was performed with primers Splink1 and
spPCR-AG3-R as follows: 94°C for 3 min, 2 cycles of
94°C 15 s, 68°C 30 s, 72°C 2 min and 31 cycles of 94°C
15 s, 62°C 30 s, 72°C 2 min, and final polymerization at
72°C for 5 min. The secondary PCR used primers
Splink2 and spinPCR-AG3-R with program setting:
94°C 3 min, 30 cycles of 94°C 15 s, 60°C 30 s, 72°C 2 min,
and final 72°C 5 min. The specific PCR products were
sequenced and the resulting sequences adjacent to the
5'LTR were aligned to the Human Genome assembly
version hg19.

Genomic analysis of provirus integration sites 

All junction sequences containing the end of 5'LTR and 
the unique cellular DNA sequence at least 30 bp in length 
were used for more detailed analysis. The sequences of 
the integration sites were mapped onto the annotated 
human-genome assembly hg19 of February 2009
(GRCh37/hg19) using BLAT. The genes/transcription
units hit by the provirus integration were identified according
to the UCSC Genes track using the University
of California at Santa Cruz browser available at
http://genome.ucsc.edu/cgi-bin/hgGateway. The UCSC Genes
track shows gene predictions based on the data from
RefSeq, GenBank, CCDS, and UniProt. The distance of
the integration site from the transcription start site was
measured according to the SwitchGear Genomics
Transcription Start Sites database. Identification of the
CpG islands was done based on the GRCh37/hg19
assembly available at the UCSC Genome Browser. The
H3K4me3 histone modification data of the HCT116 cell
line obtained by ChIP-seq were produced by the
ENCODE project at University of Washington and are 
accessible through the ENCODE June 2010 Freeze 
(http://genome.ucsc.edu/ENCODE/). The same source 
provided data on gene transcription level of targeted 
TUs (Affymetrix Exon Array, ENCODE/University of 
Washington). 
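
The annotation steps above (assignment to a transcription unit, distance from the TSS, overlap with an H3K4me3-enriched region) can be illustrated with a short sketch. It is a simplified stand-in for the browser-based analysis, not the authors' pipeline: the gene records, peak intervals, coordinates, and function names are all hypothetical, and each gene is reduced to a single TSS at the 5' end of its transcription unit.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # leftmost coordinate of the transcription unit
    end: int     # rightmost coordinate of the transcription unit
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end of the unit in the gene's own orientation
        return self.start if self.strand == "+" else self.end

def annotate_site(chrom, pos, genes, h3k4me3_peaks):
    """Annotate one integration site against genes and H3K4me3 intervals."""
    hit = next((g for g in genes
                if g.chrom == chrom and g.start <= pos <= g.end), None)
    dist = None
    if hit is not None:
        # Positive values lie downstream of the TSS along the transcript
        dist = pos - hit.tss if hit.strand == "+" else hit.tss - pos
    in_peak = any(c == chrom and s <= pos <= e for c, s, e in h3k4me3_peaks)
    return {"gene": hit.name if hit else None,
            "tss_distance": dist,
            "in_h3k4me3": in_peak}

# Invented annotation and peak data for illustration only
genes = [Gene("GENE_A", "chr1", 10_000, 50_000, "+"),
         Gene("GENE_B", "chr2", 5_000, 30_000, "-")]
peaks = [("chr1", 9_500, 12_500)]

for chrom, pos in [("chr1", 11_200), ("chr2", 28_000), ("chr3", 100)]:
    print(chrom, pos, annotate_site(chrom, pos, genes, peaks))
```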

RESULTS 

De novo DNA methylation is required for efficient ASLV 
provirus silencing 

To examine the role of DNA methyltransferases in tran- 
scriptional repression of ASLV-derived vectors newly 
integrated in the human genome, we infected wt 
HCT116 tumor cells and, in parallel, their derivatives 
with single or combined knock-outs of Dnmt1, Dnmt3a,
and Dnmt3b with an ASLV-based vector, AG3. The AG3 
vector transduces green-fluorescent protein (GFP) driven 
by ASLV LTR (Figure 1A). AG3 is replication-deficient, 
which, together with very low MOI, ensures that each 
infected cell contains only one provirus integrated in a 
distinct site of the host genome. Three to six days p.i., 
the GFP-positive cell clones were single-cell sorted by 
flow cytometry, and single cell clones representing individ- 
ual sites of provirus integration were established and 
expanded. In this way, we omitted the proviruses that had
already been silenced immediately after integration. We
isolated 73, 23, 56, 17, and 82 clones of GFP-positive wt
HCT116, HCT116 Dnmt1-/-, HCT116 Dnmt3b-/-,
HCT116 Dnmt1-/- Dnmt3b-/-, and HCT116 Dnmt3a-/-
Dnmt3b-/-, respectively, and followed the stability of
the provirus expression for up to 4 months (Figure 1B).
We observed a striking difference in provirus silencing 
between the wt and Dnmt1-/- HCT116 cells on the one
hand and de novo methyltransferase-deficient HCT116 
cells on the other hand (Figure 2). In wt HCT116 cells, 
we found 46 out of 73 clones strongly silenced with 0-5% 
of GFP-positive cells and only eight clones displaying no 
or very weak silencing with 80-100% of GFP-positive 
cells 60 days p.i. (Figure 2A). Similarly, the majority of 
clones were strongly silenced in HCT116 Dnmt1-/- cells
(Figure 2B). In contrast, among the clones of de novo 
DNA methyltransferase-deficient cells, about half of the 
clones exhibited weak or zero silencing and only rare 
clones displayed strong silencing with 0-5% of 
GFP-positive cells 60 days p.i. (Figure 2C-E). The 
dynamics of silencing is shown by percentages of 
GFP-positive cells in a representative subset of clones 
derived from wt HCT116 and HCT116 Dnmt3a-/-
Dnmt3b-/- cells at the end of the fourth and eighth week
p.i. (Figure 2F and G). The vast majority of wt HCT116 
clones were largely silenced already in the fourth week p.i.
and only rare clones retained the GFP expression unaffected.
In HCT116 Dnmt3a-/- Dnmt3b-/- cells, there
were numerous clones with a stably high percentage of
GFP-positive cells and no detectable progress to silencing.
Clones subjected to a certain degree of silencing represented
approximately one half of the clones. We
conclude that de novo DNA methyltransferase activity is
important for efficient provirus silencing and the absence
of Dnmt3b alone and especially in combination with
Dnmt3a increases the probability of long-term and
unsilenced provirus expression. The absence of the maintenance
methyltransferase Dnmt1 did not significantly
alleviate provirus silencing. In any case, exceptional
clones keep stable provirus expression even in the
presence of de novo DNA methyltransferases and, vice
versa, multiple clones tend toward silencing even in their
absence. This behavior might be caused by genomic and
epigenomic features of the respective sites of proviral
integration.

Figure 1. Experimental schema. (A) Schematic representation of the AG3 retrovirus vector. (B) Schema of HCT116 cell infection, establishment
of single-cell clones and clonal analysis of provirus silencing. White and grey cells represent the GFP-negative and GFP-positive cells, respectively,
and ψ, encapsidation signal.

Figure 2. Clonal analysis of AG3 provirus silencing. Individual clones of infected GFP-positive wt HCT116 (A), HCT116 Dnmt1-/- (B), HCT116
Dnmt3b-/- (C), HCT116 Dnmt1-/- Dnmt3b-/- (D), and HCT116 Dnmt3a-/- Dnmt3b-/- (E) cells were examined for the provirus silencing 60 days
p.i. and subdivided into five categories according to the percentage of GFP-positive cells. The categories are defined arbitrarily and the interval 0-5%
means extremely strong silencing, whereas 80-100% means zero or very weak silencing. Data are presented as percentages of clones falling into the
defined categories. Stability in time of the provirus expression in individual clones of wt HCT116 (F) and HCT116 Dnmt3a-/- Dnmt3b-/- (G) is
shown as the percentage of GFP-positive cells 26 (black columns) and 58 (gray columns) days p.i. Representative sets of clones are shown.
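
The clonal classification used for Figure 2 amounts to binning each clone by its percentage of GFP-positive cells and reporting the share of clones per bin. The sketch below shows one way to do this. Only the outer categories (0-5% and 80-100%) are stated in the text; the three intermediate boundaries, the treatment of values that fall exactly on a boundary, the function names, and the clone values are assumptions made for illustration.

```python
from collections import Counter

# Only the 0-5% and 80-100% bins are given in the text;
# the intermediate edges (20 and 50) are assumed for illustration.
BIN_EDGES = [5.0, 20.0, 50.0, 80.0]
BIN_LABELS = ["0-5", "5-20", "20-50", "50-80", "80-100"]

def silencing_category(percent_gfp: float) -> str:
    """Assign a clone to a category; a boundary value falls into the lower bin."""
    if not 0.0 <= percent_gfp <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if percent_gfp <= edge:
            return label
    return BIN_LABELS[-1]

def category_distribution(percentages) -> dict:
    """Percentage of clones that fall into each category."""
    counts = Counter(silencing_category(p) for p in percentages)
    n = len(percentages)
    return {label: 100.0 * counts[label] / n for label in BIN_LABELS}

# Invented percentages of GFP-positive cells for ten clones
clones = [1.0, 3.5, 0.0, 12.0, 47.0, 95.0, 99.0, 4.9, 66.0, 2.0]
dist = category_distribution(clones)
for label in BIN_LABELS:
    print(f"{label:>6}%: {dist[label]:.0f}% of clones")
```

Applied to the reported counts, the 46 of 73 wt HCT116 clones in the 0-5% category correspond to about 63% of clones.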



Rescue of the silencing by expression of de novo DNA 
methyltransferases 

To confirm that the absence of de novo DNA methyl- 
transferases is specifically responsible for the inefficient 
provirus silencing, we conducted a rescue experiment 
with ectopic over-expression of cloned human Dnmt3a 
or Dnmt3b. Clones of the HCT116 Dnmt3a-/-
Dnmt3b-/- cells with stable and non-silenced provirus expression
were separately transfected with vectors
pVitro3AT and pVitro3BT expressing the human DNA 
methyltransferases Dnmt3a and Dnmt3b, respectively, 
and the control vector pVitroT. The levels of Dnmt3a 
and Dnmt3b ectopic expression were comparable as 
assayed by quantitative reverse-transcriptase polymerase 
chain reaction (qRT-PCR) in an independent transfection 
experiment (data not shown). Seven days after transfection of the
de novo DNA methyltransferase constructs, the majority
of clones were silenced at least to some extent and the 
percentage of GFP-positive cells dropped substantially 
in multiple clones (Figure 3). The vast majority of clones 
exhibited more extensive loss of GFP-positive cells after 
the Dnmt3b transfection or, rarely, the effect of Dnmt3b 
and Dnmt3a was equal (Figure 3). Although the differ- 
ences in rescue efficiency between Dnmt3b and Dnmt3a 
are small in several clones, the uniformity of the trend 
suggests that Dnmt3b is a more potent silencer of 
proviruses. Only a small fraction of clones were resistant 
to the de novo DNA methyltransferase over-expression,
with an unaffected percentage of GFP-positive cells
(Figure 3). The frequency of these Dnmt3a/b-resistant 
clones corresponded approximately to the frequency of 
stable clones in the wt HCT116 cells. We therefore 
tested the Dnmt3a/b resistance in the stable clones of wt 
HCT116 cells in an analogous rescue experiment. All 
seven tested stable clones of wt HCT116 cells turned out 
to be resistant to Dnmt3a/b over-expression (data not 
shown), whereas clones with weak and slow silencing 
were sensitive and displayed substantial loss of GFP expression
upon Dnmt3a/b transfection (Supplementary
Figure S1). This suggests again that this stability of
provirus expression even in the presence of de novo 
DNA methyltransferases is a result of integration into 
specific target sites, and the probability of such integration 
is the same in wt and de novo DNA methyltransferase- 
deficient HCT116 cells. 

CpG methylation-independent silencing in de novo DNA 
methyltransferase-deficient cells 

As the cells lacking Dnmt3a or Dnmt3b exhibited limited 
but still detectable provirus silencing, we examined the 
mechanism of this transcriptional suppression. First, we 
used 5-azaC and sodium butyrate, inhibitors of DNA 
methyltransferases and HDAC, respectively, to reactivate 
the GFP expression in clones of wt HCT116 and HCT116 
Dnmt3a-/- Dnmt3b-/- cells with silenced proviruses. The
silenced proviruses in the wt HCT116 cells were partially
reactivated by the treatment with either 5-azaC or sodium
butyrate (Figure 4A). The effect of sodium butyrate was
more profound, and with the combination of both drugs, an
additive effect was observed in multiple clones. We
observed quite a different situation in the silenced
HCT116 Dnmt3a-/- Dnmt3b-/- clones (Figure 4B). The
effect of 5-azaC was marginal, without any addition to 
that of sodium butyrate. In comparison, sodium 
butyrate alone reactivated provirus expression efficiently. 

Secondly, we examined the DNA methylation status of 
the promoter region of either active or silenced proviruses 
in the wt HCT116, HCT116 Dnmt3b-/-, and HCT116
Dnmt3a-/- Dnmt3b-/- cells. The region analyzed by
bisulfite sequencing spans from the 5'end of the 5'LTR
to the position +200 in the leader sequence. Three repre- 
sentative examples of 5'LTR CpG methylation are shown 
in Figure 5. All 17 active proviruses tested showed 
unmethylated or only sporadically methylated 5'LTRs in 
wt as well as DNA methyltransferase-deficient cells 
(Figure 5A). In the majority (9 of 14) of silenced clones 
of wt HCT116 cells, the proviral 5'LTRs were heavily 
methylated (Figure 5B). This abundance of heavily 
methylated 5'LTRs slightly dropped among the silenced 
proviruses in the HCT116 Dnmt3b-/- cells (not shown). In
contrast, we did not find any significantly methylated
provirus in the silenced clones of HCT116 Dnmt3a-/-
Dnmt3b-/- cells (Figure 5B) and all proviruses integrated
in these cells were unmethylated regardless of their tran- 
scriptional state. 

We analyzed the DNA methylation status of proviral 
LTRs in several clones subjected to the silencing rescue 
experiments. The analysis revealed that the loss of the 
GFP expression upon Dnmt3a/b transfection was 
accompanied by heavy DNA methylation of the proviral 
5'LTRs. The extent of the LTR CpG methylation 
correlated with the loss of GFP-positive cells. The 
over-expression of Dnmt3a led to a lower level of DNA 
methylation in comparison with Dnmt3b, which corres- 
ponded to the different efficiency of de novo DNA 
methyltransferases in the silencing rescue experiment (for
details see the section 'The DNA methyltransferase-sensitive
proviruses are integrated in the regions of
methylated DNA' and Figure 8).

Figure 4. Reactivation of silenced proviruses in clones treated with inhibitors of DNA methyltransferases and HDAC. (A) wt HCT116 cell clones,
(B) HCT116 Dnmt3a-/- Dnmt3b-/- cell clones. A representative selection of clones was treated with 5-azaC (white columns), sodium butyrate (gray
columns), or a combination of both (dashed columns) and the percentage of GFP-positive cells was measured by flow cytometry after 2 days of the
treatment. Black columns represent the percentages of GFP-positive cells in mock-treated cells.



Epigenomic features of integration sites permissive for the 
stable proviral expression 

The low-frequency occurrence of clones with stable and 
Dnmt3a/b-resistant provirus expression, as well as our 
previous analysis of retrovirus integration sites in virus- 
induced sarcomas (18), suggested the importance of the 
integration site for either silencing or maintenance of 
provirus expression. We, therefore, characterized the 
integration sites of the proviruses from clones with 
stable versus silenced proviral expression by the
splinkerette-PCR technique and BLAT alignment against the
human genomic assembly GRCh37, version hg19. For
the integration site analysis, we selected clones with
stable GFP expression (95-100% GFP-positive cells),
mostly from the sets of clones described in Figure 2A
and E but also from an additional set of clones derived
from pre-selected GFP-positive cells. In total, we obtained
113 unambiguously mapped provirus integration sites
from wt HCT116 and HCT116 Dnmt3a-/- Dnmt3b-/-
cells (Table 1). The whole data set of integration sites
together with characteristics of respective cell clones is
given in Supplementary Table S2. Targeting the annotated
human genes, either untranslated regions, intronic, or
exonic parts, according to the UCSC Genes track was
regarded as integration into TU.

Figure 5. CpG methylation status of the 5'LTRs in wt HCT116 and HCT116 Dnmt3a-/- Dnmt3b-/- clones. Representative cell clones with
non-silenced (A) and silenced (B) proviruses were chosen and CpG methylation was investigated by bisulfite sequencing. Methylated CpG dinucleotides
are indicated by solid circles, non-methylated CpGs are indicated by open circles. Numbers indicate the percentages of GFP-positive cells in
particular cell clones (boxed numbers) and the percentages of methylated CpG dinucleotides.

The proviruses with stable expression were found to be 
integrated almost exclusively into TUs in both wt HCT116 
and HCT116 Dnmt3a-/- Dnmt3b-/- cells (Table 1). 
However, the distribution of these integration sites along 
the whole TUs differed between the wt HCT116 and 
HCT116 Dnmt3a-/- Dnmt3b-/- cells (Figure 6). The 
seven stable proviruses in the wt HCT116 cells were all 
found close to the TSSs, no more than 2.7 kb downstream. 
Even more striking was the absolute 
overlap of these integration sites with the regions 
enriched in lysine 4 trimethylation of histone 3 
(H3K4me3) as identified in the ENCODE project 
database. This chromatin modification is a hallmark of 
regions proximal to TSSs of transcriptionally active 
genes. In contrast, the stably expressed proviruses in the 
HCT116 Dnmt3a-/- Dnmt3b-/- cells were distributed 
throughout the whole TUs without significant accumulation 
in the vicinity of the TSSs. Interestingly, integrations 
matching the H3K4me3-rich regions of their respective 
genes corresponded to the proviruses resistant to 
silencing rescue by Dnmt3a/b over-expression. An 
example of a TU targeted by provirus integration is 
given in Supplementary Figure S2. 

The distribution of integration sites in clones with 
unstable (silenced) provirus expression also differed 
between wt HCT116 and HCT116 Dnmt3a-/- Dnmt3b-/- 
cells. In the wt HCT116 clones, integration sites were 
found either within or outside of TUs. When in TUs, 
the integrations were found throughout the TUs except 
for the proximal gene regions enriched in H3K4me3. In 
HCT116 Dnmt3a-/- Dnmt3b-/- cells, the majority of 
silenced proviruses were detected outside of TUs, and the 
intra-genic insertions were rare and confined to distal regions 
of extremely large TUs, 43-450 kb from the TSS. Of note, 
the provirus silencing was less efficient here. Moreover, 
the unstable proviruses closer than 100 kb to the TSS 
were in the antisense orientation to the transcription of 
the respective gene (Figure 6). 
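
The three quantities used throughout this section (whether an integration falls into a TU, its distance downstream of the TSS, and its orientation relative to the host gene) can be derived from a mapped position and a gene annotation. The following is a minimal sketch of that bookkeeping, not the pipeline used in the study: the `Gene` record, the `annotate_integration` function, and the 0-based half-open coordinates are assumptions made for illustration, and overlapping genes are resolved naively by taking the first match.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record (0-based, half-open coordinates)."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the first transcribed base on the gene's own strand.
        return self.start if self.strand == "+" else self.end - 1


@dataclass(frozen=True)
class SiteAnnotation:
    gene: Optional[str]               # None means integration outside of TUs
    distance_from_tss: Optional[int]  # bp downstream of the TSS
    orientation: Optional[str]        # "sense" or "antisense"


def annotate_integration(chrom: str, pos: int, provirus_strand: str,
                         genes: Sequence[Gene]) -> SiteAnnotation:
    """Classify one mapped integration site against a list of genes."""
    for gene in genes:
        if gene.chrom == chrom and gene.start <= pos < gene.end:
            # Distance is measured along the direction of host transcription.
            if gene.strand == "+":
                distance = pos - gene.tss
            else:
                distance = gene.tss - pos
            same = provirus_strand == gene.strand
            return SiteAnnotation(gene.name, distance,
                                  "sense" if same else "antisense")
    return SiteAnnotation(None, None, None)
```

A real analysis would also have to decide how to treat overlapping or nested genes and alternative TSSs, which this sketch deliberately ignores.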

The protective effect of the TSS and the associated 
H3K4me3-rich region for the maintenance of long-term 
provirus expression was further highlighted in the silencing 
rescue experiment (Figure 7). Proviruses integrated into 
genes in the HCT116 Dnmt3a-/- Dnmt3b-/- cells are 
stably expressed with no or only a negligible level of 
silencing even as far as 60 kb from the respective TSS 
(Figure 7A). Upon Dnmt3b over-expression, 
however, the sensitivity of provirus expression to 
Dnmt3b increased with the distance from the TSS, most 
strikingly within the first 20 kb (Figure 7B). Another 
factor affecting the silencing can be the provirus orientation 
in relation to the transcription of the targeted gene. 
This is apparent from the distribution of proviruses 
integrated within genes. Among the silenced proviruses, 
both in wt HCT116 and in HCT116 Dnmt3a-/- 
Dnmt3b-/- cells, there were approximately the same 
numbers of sense and anti-sense integrations. The stable 
proviruses, however, tended to be in sense orientation, particularly in 
wt HCT116 cells (Figure 6). The proviruses integrated in 
anti-sense orientation were more susceptible to silencing 
even at a shorter distance from the TSS (Figure 7B). 



Altogether, provirus expression and silencing are inter- 
connected with transcription of the targeted genes and 
the H3K4me3-enriched regions are of particular import- 
ance for the protection from DNA methyltransferase- 
dependent silencing. 



Table 1. Overview of clones with characterized sites of provirus 
integration subdivided according to the cell line, stable provirus 
expression versus silencing, and localization in or outside TUs 



[Table 1 body not recovered; surviving column headers: Cell line, Provirus expression, insertions.] 
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"In addition to the 33 clones with 95-100% of GFP-positive cells from 
the experiment described in Figure 2E, a further 29 independent clones 
obtained from pre-selected GFP-positive HCT116 Dnmt3a-/- 
Dnmt3b-/- cells were included in the integration site analysis. 



The DNA methyltransferase-sensitive proviruses are 
integrated in the regions of methylated DNA 

The aforementioned data raised the question about the 
interplay between the DNA methylation at the site of 
provirus integration and de novo DNA methylation of 
proviral regulatory sequences. There is a possibility that 
provirus integration into hypermethylated regions can de- 
termine the de novo DNA methylation and transcriptional 
silencing of the provirus. In order to answer this question, 
we analyzed the DNA methylation status within 300- 
600 bp of the genomic DNA adjacent to the proviral 
5'LTR in a number of representative clones of HCT116 
Dnmt3a-/- Dnmt3b-/- cells. We found that stably expressed 
and de novo DNA methyltransferase-resistant 
proviruses are integrated in unmethylated genomic DNA 
(Figure 8A, Supplementary Figure S3). 

Figure 6. Distribution of provirus integration sites along the targeted transcription units. Positions of provirus integration sites in cell clones with 
stable (non-silenced) provirus expression (A) and cell clones with unstable (silenced) provirus expression (B) in wt HCT116 and HCT116 Dnmt3a-/- 
Dnmt3b-/- cells are shown as vertical arrows in the absolute distance from the TSS up to 500 kbp. Proviruses proximal to the TSSs are shown in the 
enlarged 60 kbp regions below. The numbers of intergenic integrations are shown out of scale. Downward arrows, proviruses integrated in the same 
orientation as transcription of the targeted gene; upward arrows, proviruses integrated in antisense orientation. Grey areas, the maximum range of 
the H3K4me3-rich region. 

Figure 7. Relation between the distance of provirus integration from TSS and the sensitivity to Dnmt3b over-expression. The level of silencing upon 
the Dnmt3b over-expression is shown in clones of HCT116 Dnmt3a-/- Dnmt3b-/- cells. Cell clones with proviruses integrated in TUs outside of 
and the proximal proviral LTR (x axis). Filled diamonds represent proviruses integrated in the same orientation as the transcription of the targeted 
gene; open diamonds represent proviruses integrated in antisense orientation. The trendline was calculated from the distribution of proviruses 

In contrast, 
the silenced proviruses and active proviruses sensitive 
to the Dnmt3a/b over-expression (conditionally stable) 
are integrated in hypermethylated DNA regions 
(Figure 8B and C). The over-expression of Dnmt3a/b in 
conditionally stable clones resulted in the expansion of 
surrounding methylation patterns into the proviral 
LTR promoter (Figure 8B, Supplementary Figure S4), 
which was accompanied by the loss of provirus expression. 
However, the conditionally stable and silenced proviruses 
integrated in intergenic regions were not methylated 
with the same efficiency. The conditionally stable and 
rare silenced intra-genic proviruses were very efficiently 
methylated upon Dnmt3a/b ectopic expression. 
Dnmt3b appeared to serve as a more efficient methyla- 
tion effector than Dnmt3a, which corresponded to the 
differences in the silencing efficiency. The silenced 
proviruses in intergenic regions were methylated 
with very low efficiency (Figure 8C, Supplementary 
Figure S5). 
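
The percentages of methylated CpG dinucleotides reported with the bisulfite sequencing figures follow from a simple rule: after bisulfite conversion, a methylated cytosine is still read as C, whereas an unmethylated cytosine is read as T. The sketch below illustrates that calculation under simplifying assumptions that are not taken from the study: the function names are hypothetical, each read is assumed to be already aligned to the reference with the same length and no indels, and only the top strand is considered.

```python
from typing import List, Sequence


def cpg_positions(reference: str) -> List[int]:
    """Return indices of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def percent_methylated(reference: str, reads: Sequence[str]) -> float:
    """Percentage of methylated CpGs across bisulfite-converted reads.

    Assumes each read is aligned to `reference` position by position.
    At a CpG site, C means methylated (protected from conversion) and
    T means unmethylated (converted); any other base is skipped.
    """
    sites = cpg_positions(reference)
    methylated = 0
    informative = 0
    for read in reads:
        for i in sites:
            base = read[i].upper()
            if base == "C":
                methylated += 1
                informative += 1
            elif base == "T":
                informative += 1
    if informative == 0:
        return float("nan")
    return 100.0 * methylated / informative
```

In the solid/open circle diagrams, each row corresponds to one sequenced molecule and each circle to one CpG site, so the reported percentage is the share of solid circles among all scored circles.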

Based on the data, we suggest a model where 
retroviruses integrated in the close vicinity of transcriptionally 
active cellular promoters have the potential for 
absolutely stable expression. Outside of such regions, the 
proviruses are subjected to Dnmt3a/b-dependent 
(proviruses in transcribed genomic regions) or Dnmt3a/ 
b-independent (predominantly proviruses integrated 
outside of genes) silencing (Figure 9). 
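
The three outcomes of the proposed model can be restated as a small decision rule. This is only an illustrative simplification: the names `Fate` and `predicted_fate` are hypothetical, the two boolean inputs stand in for annotations that would have to come from gene and H3K4me3 tracks, and the rule ignores the exceptions noted in the text (the model is stated as "predominantly" true for intergenic sites).

```python
from enum import Enum


class Fate(Enum):
    STABLE = "stable expression, resistant to Dnmt3a/b over-expression"
    DNMT_DEPENDENT = "silencing dependent on Dnmt3a/b"
    DNMT_INDEPENDENT = "silencing independent of Dnmt3a/b"


def predicted_fate(in_transcription_unit: bool,
                   in_h3k4me3_region: bool) -> Fate:
    """Simplified reading of the model for a single integration site."""
    if in_transcription_unit and in_h3k4me3_region:
        # Close to the TSS of an active gene, within the H3K4me3-rich region.
        return Fate.STABLE
    if in_transcription_unit:
        # Transcribed gene body outside the H3K4me3-rich region.
        return Fate.DNMT_DEPENDENT
    # Outside of genes.
    return Fate.DNMT_INDEPENDENT
```

For example, `predicted_fate(True, False)` corresponds to a provirus in a gene body, the class that was stably expressed in the double knock-out cells but silenced once Dnmt3a or Dnmt3b was supplied.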



DISCUSSION 

CpG methylation of provirus DNA and repressive histone 
methylation of associated nucleosomes are well- 
established as epigenetic mechanisms inhibiting retroviral 
expression at the level of transcription and leading to 
variegation and provirus silencing. Neither of these 
branches can satisfactorily explain all aspects of provirus 
silencing, although there are experimental settings where 
histone methyltransferases mediate silencing independ- 
ently of DNA methyltransferases and vice versa. We dem- 
onstrate that provirus silencing occurs in the context of 
flanking cellular DNA, and both activating and suppres- 
sive influences of the flanking chromatin features must be 
considered. We present the first analysis of provirus 
silencing in single-cell clones with characterized chromo- 
somal positions of proviruses. Furthermore, integration 
into genomes of cells deficient or proficient in de novo 
DNA methyltransferases provided information about 
the involvement of DNA methylation in retrovirus 
silencing at certain genomic positions. We found 
that retrovirus integration into TUs close to the TSSs 
and within the regions enriched in H3K4me3 
permitted long-term unsilenced provirus expression and 
protected the provirus regulatory sequences from CpG 
methylation even under Dnmt3a/b over-expression. 
Proviruses integrated into the transcribed parts of genes 
outside of H3K4me3 regions were silenced by DNA 
hypermethylation of LTRs, whereas proviruses inserted 
in intergenic regions were efficiently silenced without 
accumulation of methylated CpGs. 

Figure 8. DNA methylation of the provirus and adjacent host cell sequences. 
CpG methylation status of the 5'LTRs and 300-600 bp of the 
genomic DNA upstream of the 5'LTR (not to scale) in representative 
clones of HCT116 Dnmt3a-/- Dnmt3b-/- cells with stable (A), stable 
but de novo Dnmt-sensitive (conditionally stable) (B), and silenced (C) 
provirus expression. CpG methylation was investigated by bisulfite 
sequencing after transfection of the empty vector (mock) or vectors 
expressing either Dnmt3a or Dnmt3b. CpG methylation status of the 
genomic DNA upstream of the 5'LTR is indicated only in 
mock-transfected cells. Methylated CpG dinucleotides are indicated by solid 
circles, non-methylated CpGs are indicated by open circles. Numbers 
indicate the percentages of GFP-positive cells in particular cell clones 
and the percentages of methylated CpG dinucleotides. The length of the 
integration site analyzed is indicated. 

Our analysis confirmed the significance of de novo DNA 
methylation for retrovirus silencing because the 
absence of Dnmt1 did not lead to any significant silencing 
defect and the silencing was comparable in HCT116 
Dnmt3b-/- and HCT116 Dnmt1-/- Dnmt3b-/- cell lines. 
De novo DNA methylation, however, is not strictly necessary 
for provirus silencing. Proviruses integrated in 
intergenic regions or extremely far from TSSs in long 
TUs remain silenced even in HCT116 Dnmt3a-/- 
Dnmt3b-/- cells, and intergenic proviral insertions are 
not CpG methylated by ectopically expressed Dnmt3b 
or Dnmt3a. The comparison of provirus silencing in 
HCT116 Dnmt3b-/-, HCT116 Dnmt3a-/- Dnmt3b-/-, 
and HCT116 Dnmt1-/- Dnmt3b-/- cell lines also 
excluded the influence of overall genome methylation 
and the probability of proviral integration into densely 
methylated host cell DNA. These cell lines contain 97, 
80, and <5%, respectively, of total genomic methylation 
of wt HCT116 (51) but reached similar efficiencies of the 
provirus silencing. Because the single knock-out of 
Dnmt3a was not available, we can only speculate about 
its silencing phenotype. Dnmt3a was reported as a potent 
provirus silencer in mouse embryonic stem cells (48). 
In HCT116 Dnmt3a-/- Dnmt3b-/- cells, however, the 
absence of Dnmt3a meant only a slight additional 
decrease in silencing efficiency in comparison with the 
knock-out of Dnmt3b alone and Dnmt3a scored weaker 
than Dnmt3b in silencing rescue experiments (Figure 3). 
This difference can be explained by the low Dnmt3a ex- 
pression in the wt HCT116 cell line (51), lower DNA 
methyltransferase activity of Dnmt3a in comparison 
with Dnmt3b (55), and the dependence of Dnmt3a on 
the guidance and stimulation by Dnmt3L (56), which is 
not expressed in HCT116 cells. 

The main finding of our study is that proviruses 
integrated close to the TSSs within the H3K4me3- 
enriched regions remain stably expressed and cannot be 
silenced even in cells with artificially increased expression 
of Dnmt3a or Dnmt3b. H3K4 trimethylation marks the 5' 
parts of transcriptionally active or at least poised 
genes and usually forms broader surroundings of CpG 
islands and polymerase II-enriched regions (57,58). 
Mechanistically, at least Dnmt3a was shown to prefer 
non-methylated H3K4 under the guidance of Dnmt3L 
(59) and to be excluded from H3K4me3 (60). In wt 
HCT116 cells, stably expressed proviruses were integrated 
exclusively in H3K4me3-enriched regions, whereas the 
silenced proviruses were distributed in the opposite 
way, in the rest of the gene bodies and in intergenic regions 
(Figure 6). We suggest that integrations in gene bodies 
normally result in provirus silencing because of increased 
levels of H3K36me3, which recruits de novo DNA 
methyltransferases (61). However, this control is leaky in 
de novo methyltransferase-deficient cells, and in HCT116 
Dnmt3a-/- Dnmt3b-/- cells, stable provirus expression is 
also permitted more distantly in the gene bodies outside 
the H3K4me3-rich regions. Accordingly, silenced 
proviruses in these cells were integrated almost exclusively 
outside of TUs and rarely scattered in distant parts of 
extremely large TUs, 43-440 kb from the TSSs. We 
conclude that intergenic regions are for the most part 
non-permissive to the stable ASLV provirus expression 
and this non-permissiveness is independent of DNA 
methyltransferases. 
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Figure 9. Expression of proviruses integrated in different genomic localizations, an integrative model. Proviruses integrated close to the TSS within 
the H3K4 trimethylation region are stably expressed and insensitive to over-expression of de novo Dnmt (left). Proviruses integrated within the gene 
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pendent of de novo Dnmts (right). Provirus expression is indicated by a picture with gray and white cells. 



The exceptional character of integrations into the 
H3K4me3 regions was further underlined by ectopic 
over-expression of Dnmt3a/3b because these proviruses 
in wt HCT116 cells kept their stability under these artificial 
conditions. The same treatment of stable clones isolated 
from HCT116 Dnmt3a-/- Dnmt3b-/- cells and HCT116 
Dnmt3b-/- cells led in most cases to the rescue of silencing 
with the exception of a few resistant clones (Figure 3). The 
frequency of these clones was comparable with that of stable 
clones in the wt HCT116 cells and they all harbored 
proviruses integrated into the H3K4me3 regions. These 
results clearly show that the H3K4me3 environment 
permits autonomous expression of newly introduced 
DNA sequences and protects them from epigenetic 
silencing. Because of the silencing at other genomic positions, 
only insertions into the H3K4me3 regions 
are observed when selection for stable proviral expression 
is applied in cells with normal de novo DNA 
methyltransferase composition. 

Stable expression of proviruses integrated close to the 
TSSs associated with CpG islands is not surprising. CpG 
islands were shown to protect adjacent promoters from 
DNA methylation (62) and this capacity has already 
been employed in the design of a silencing-resistant and 
DNA methylation refractory retroviral vector (41,42). 
However, the protective effects do not extend far 
towards the bodies of active genes, which are enriched in 
H3K36me3 and DNA methylation (63,64). The efficiency 
of the silencing rescue after Dnmt3a/b over-expression 
increased with the distance from the TSS (Figure 7B). 
The functional dependence best fits the geometric distribution, 
with variance probably produced by variable 
promoter strength, variable chromatin structure at 
exon-intron junctions, etc. The general decline of pro-transcriptional 
histone modifications along the gene 
bodies was shown, e.g. for the lateral H3K79me2 (65). 
Proviruses transcribed in antisense orientation to the 
host gene tended to be more sensitive to de novo DNA 
methyltransferases and were not included in calculation 
of the trend line. This situation resembles the regulation 
of many imprinted loci, where the increase of DNA and 
H3K27 methylation and the decrease of H3K4 methyla- 
tion are guided by non-coding antisense transcripts of im- 
printing centers (66). 
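
The text does not state how the geometric fit of the Figure 7B trend was parameterized. As one possible reading, offered purely as an assumption, the silenced fraction at distance d from the TSS could follow the cumulative geometric form 1 - (1 - p)^d, where p is a hypothetical per-kilobase probability of acquiring the silencing mark. The sketch below estimates p by a plain least-squares grid search; the function names, the choice of kilobases as the unit, and the grid resolution are all illustrative.

```python
import math
from typing import Sequence


def geometric_cdf(distance_kb: float, p: float) -> float:
    """Expected silenced fraction at a given distance under the assumed model."""
    return 1.0 - (1.0 - p) ** distance_kb


def fit_per_kb_probability(distances_kb: Sequence[float],
                           silenced_fractions: Sequence[float],
                           grid_steps: int = 1000) -> float:
    """Least-squares estimate of p over a uniform grid on (0, 1)."""
    best_p = float("nan")
    best_error = math.inf
    for step in range(1, grid_steps):
        p = step / grid_steps
        # Sum of squared residuals between model and observed fractions.
        error = sum((geometric_cdf(d, p) - s) ** 2
                    for d, s in zip(distances_kb, silenced_fractions))
        if error < best_error:
            best_p, best_error = p, error
    return best_p
```

Consistent with the trend line described above, such a fit would be applied to sense-oriented proviruses only, with distances and silenced fractions taken from the measured clones.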

Our analysis of DNA methylation found all proviruses 
unmethylated in HCT116 Dnmt3a-/- Dnmt3b-/- cells regardless 
of their expression status. This is convincing 
evidence that provirus silencing can be established and 
maintained even without DNA methylation. The stably 
expressed proviruses in H3K4me3-enriched regions 
appeared to be enclosed by unmethylated CpGs and this 
hypomethylated state did not change even after 
over-expression of Dnmt3a/3b, evidencing the resistance 
of H3K4me3-enriched regions (59). In striking contrast, 
proviruses integrated in gene bodies keep unmethylated 
LTRs surrounded by highly methylated DNA sequences 
in HCT116 Dnmt3a-/- Dnmt3b-/- cells. CpG methylation 
of DNA within gene bodies must be maintained by Dnmt1 
as it persists even in the double knock-out of Dnmt3a/3b. 
After Dnmt3a/3b ectopic expression, LTRs of proviruses 
integrated in gene bodies acquire dense CpG methylation, 
which positively correlates with the level of provirus 
silencing. The highly efficient methylation of provirus 
DNA in actively transcribed genes suggests a model 
where histone methyltransferase HYPB/Setd2 interacts 
with the processive RNA polymerase II and 
co-transcriptionally methylates H3K36 in the gene body 
together with proviral LTR promoters (67-69). The 
H3K36 trimethylation subsequently serves as a signal for 
de novo DNA methylation and thus provirus transcriptional 
silencing (61). In HCT116 Dnmt3a-/- Dnmt3b-/- 
cells, the DNA methylation cannot be adjusted to the 
local epigenetic environment. According to this hypoth- 
esis, the localization in the bodies of actively transcribed 
genes exposes the integrated provirus to repressive epigen- 
etic environment and pre-determines subsequent DNA 
methyltransferase-dependent suppression. The intergenic 
provirus insertions are silenced in all cell lines independently 
of DNA methylation, and the silencing is most probably 
driven solely by the repressive histone marks. The flanking 
DNA sequences are almost fully methylated, but the 
density of CpGs is low in intergenic regions. Actually, 
we found two unsilenced but Dnmt3a/3b-sensitive 
proviruses outside of TUs but close to an active gene 
terminus. Both proviruses were found to be methylated 
upon Dnmt3a/3b expression. Their proximity of 0.5 and 
1.5 kb to the gene terminus enables read-through from 
adjacent genes, as confirmed by the ENCODE Exon 
Array data of HCT116 cells, and the passing transcription 
complex could start the H3K36me3-dependent DNA 
methylation. The available ChIP-seq data detect 
RNA polymerase II and the H3K36me3 modification in 
such regions. Proviruses integrated closely upstream to 
active promoters were found to be transcriptionally 
silent but were not efficiently methylated after Dnmt3a/b 
over-expression. 

In conclusion, we propose a model of the provirus transcriptional 
crosstalk with surrounding chromatin at the 
site of integration, where the long-term provirus expression 
or gradual provirus silencing is largely 
pre-determined by local epigenomic features (Figure 9). 
Proviruses integrated within the H3K4me3-enriched 
regions connected with promoters of active, mostly house- 
keeping genes keep their transcription activity and cannot 
be efficiently silenced by DNA methylation. Proviruses 
integrated in the bodies of transcribed genes are silenced, 
but their silencing depends on the de novo DNA methyla- 
tion capacity of the host cell. Proviruses integrated 
in intergenic regions are strongly silenced in a DNA 
methylation-independent way. Provirus silencing is a 
general phenomenon; nevertheless, two particular 
aspects of our study should be considered in the future. 
First, the speed and extent of silencing are species-specific 
and the validity of our model based on an ASLV-derived 
vector should be further tested with various retroviral 
groups in different cell types. ASLVs are susceptible to 
efficient silencing and CpG methylation in mammalian 
cells (30,70-72), which together with an almost random 
integration into the host genome makes them an ideal 
model for the study of retrovirus silencing at various 
chromosomal loci. For HIV-1-derived lentiviral vectors, 
provirus silencing was described as well (73,74) 
despite the complex transcriptional regulation and the 
presence of Sp1 sites in the HIV-1 LTR. The phenomenon 
of HIV-1 persistence in a transcriptionally latent state 



further underlines the importance of epigenetic silencing 
in the course of retrovirus infection (75,76). In our prelim- 
inary experiments, MLV-derived vectors in HCT116 cells 
are less susceptible to the provirus silencing (data 
not shown), probably due to their integration preference 
for TSSs (5). We assume that the epigenomic pre- 
determination of provirus silencing will be weaker for 
MLV and HIV-1 in mammalian cells and also for ASLV 
in permissive avian cells. Another aspect of our study 
to be considered is the early silencing occurring in the 
process of or immediately after provirus integration 
when the DNA lesion triggers an extensive chromatin 
response at the site of integration. We sorted the 
GFP-positive cells several days p.i., assuming that many 
proviruses had already been silenced at that time. The 
proportion of ab initio silenced proviruses cannot be 
determined in our experimental setup, but it was previously 
estimated to be ca. 80% for HIV-1-based vectors 
in human T cells (77). Our findings provide a valuable 
contribution to the retrovirus-mediated gene therapy con- 
cerning the efficiency, long-term effects, and safety issues 
of retrovirus integration. It becomes clear that efficient 
retroviral vectors for gene transfer require specific protect- 
ive modifications averting the mostly repressive influence 
of the surrounding chromatin. 
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